Multi-Level Audio Classification Architecture
Authors
Abstract
Similar resources
Multi-level Attention Model for Weakly Supervised Audio Classification
In this paper, we propose a multi-level attention model to solve the weakly labelled audio classification problem. The objective of audio classification is to predict the presence or absence of audio events in an audio clip. Recently, Google published a large-scale weakly labelled dataset called Audio Set, where each audio clip contains only the presence or absence of the audio events, without ...
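The clip-level aggregation such attention models rely on can be sketched as follows. This is a minimal NumPy illustration of per-class attention pooling over segment predictions, not the authors' exact multi-level architecture; all shapes, names, and the toy numbers are assumptions:

```python
import numpy as np

def attention_pooling(segment_probs, attention_logits):
    """Aggregate segment-level class probabilities into a clip-level
    prediction using per-class attention weights.

    segment_probs:    (n_segments, n_classes) event probabilities per segment
    attention_logits: (n_segments, n_classes) unnormalised attention scores
    """
    # Softmax over the segment axis gives each segment a weight per class
    w = np.exp(attention_logits - attention_logits.max(axis=0, keepdims=True))
    w /= w.sum(axis=0, keepdims=True)
    # Clip-level probability = attention-weighted average of segment probabilities
    return (w * segment_probs).sum(axis=0)

# Toy clip with 3 segments and 2 event classes (values are illustrative)
probs = np.array([[0.9, 0.1],
                  [0.8, 0.2],
                  [0.1, 0.7]])
logits = np.array([[2.0, 0.0],
                   [1.0, 0.0],
                   [0.0, 3.0]])
clip_pred = attention_pooling(probs, logits)  # one probability per class
```

Because the clip-level output is a convex combination of segment probabilities, it stays a valid probability, which is what makes this pooling suitable for weak (clip-only) labels.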
Multi-timescale PMSCs for Music Audio Classification
Principal mel-spectrum components (PMSCs) [3] are computed at several timescales in parallel [2]. For each timescale, the feature extraction involves four steps: discrete Fourier transform (DFT), mel-scaling, principal component analysis (PCA) whitening and temporal pooling. First, for each timescale, we compute discrete Fourier transforms over a given time length. To compute PMSCs at different ...
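The four-step pipeline can be sketched roughly as follows. This is a minimal NumPy sketch under assumed parameters; the triangular filterbank, frame sizes, and mean pooling are illustrative choices, not the paper's exact settings:

```python
import numpy as np

def mel_filterbank(n_mel, n_bins, sr):
    # Toy triangular filters equally spaced on the mel scale
    def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
    hz = mel_to_hz(np.linspace(0, hz_to_mel(sr / 2), n_mel + 2))
    bins = np.floor((n_bins - 1) * 2 * hz / sr).astype(int)
    fb = np.zeros((n_mel, n_bins))
    for i in range(n_mel):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l: fb[i, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c: fb[i, c:r] = (r - np.arange(c, r)) / (r - c)
    return fb

def pmsc_features(signal, frame_len=1024, hop=512, n_mel=40, n_pc=20, sr=22050):
    # 1) DFT: frame the signal and take windowed magnitude spectra
    frames = np.stack([signal[i:i + frame_len]
                       for i in range(0, len(signal) - frame_len + 1, hop)])
    spectra = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))

    # 2) Mel-scaling: project onto the filterbank, take log energies
    mel_spec = np.log(spectra @ mel_filterbank(n_mel, spectra.shape[1], sr).T + 1e-8)

    # 3) PCA whitening: decorrelate and rescale the mel bands
    centered = mel_spec - mel_spec.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    top = np.argsort(vals)[::-1][:n_pc]
    whitened = centered @ vecs[:, top] / np.sqrt(vals[top] + 1e-8)

    # 4) Temporal pooling: summarise frames over the timescale (mean here)
    return whitened.mean(axis=0)
```

Running the pipeline at several timescales, as the abstract describes, would amount to calling `pmsc_features` with different `frame_len`/pooling-window settings and concatenating the results.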
High-level feature weighted GMM network for audio stream classification
The problem of unsupervised audio classification continues to be a challenging research problem which significantly impacts ASR and Spoken Document Retrieval (SDR) performance. This paper addresses novel advances in audio classification for speech recognition. A new algorithm is proposed for audio classification, which is based on the Weighted GMM Network (WGN). Two new high-level features: VSF (V...
Block-Level Audio Features for Music Genre Classification
While frame-level audio features, e.g. MFCCs, in combination with the bag-of-frames (BOF) approach have been used widely and successfully, we use a block processing framework in our submission. In general, block-level features have the advantage that they can capture more temporal information than BOF approaches can. We introduce two novel spectral patterns, closely related to the spectrum histogram a...
Raw Waveform-based Audio Classification Using Sample-level CNN Architectures
Music, speech, and acoustic scene sounds are often handled separately in the audio domain because of their different signal characteristics. However, as the image domain has advanced rapidly through versatile image classification models, it is necessary to study extensible classification models in the audio domain as well. In this study, we approach this problem using two types of sample-level deep convolut...
Journal
Journal title: Advances in Electrical and Electronic Engineering
Year: 2015
ISSN: 1804-3119,1336-1376
DOI: 10.15598/aeee.v13i4.1454